Efficient use of hybrid media in cache architectures

ABSTRACT

A multi-tiered cache manager and methods for managing a multi-tiered cache are described. The multi-tiered cache manager causes cached data to be initially stored in the RAM elements and selects portions of the cached data stored in the RAM elements to be moved to the flash elements. Each flash element is organized as a plurality of write blocks having a block size, wherein a predefined maximum number of writes is permitted to each write block. The portions of the cached data may be selected based on a maximum write rate calculated from the maximum number of writes allowed for the flash device and a specified lifetime of the cache system.

This application is a continuation of U.S. Ser. No. 12/650,966, filed on Dec. 31, 2009, which claims priority to U.S. Ser. No. 61/142,046, filed on Dec. 31, 2008, each of which is incorporated herein by reference.

BACKGROUND

1. Field of the Invention

The present invention relates generally to data storage, and more particularly to an architecture and approach for using hybrid media in high performance, highly scalable storage accelerators for computer networks.

2. Description of Related Art

In computing architectures that use externally attached storage such as Network Attached Storage (NAS) or Storage Area Networks (SANs), there is a growing mismatch between the increasing speed of computer servers and the ability of storage systems to deliver data in a timely fashion. The inability of storage systems to keep pace with fast servers can cause applications to stall and result in overall throughput of the system reaching a plateau or regressing under significant load.

An examination of the root causes of this scalability problem reveals a common factor related to the latency of fetching data from spinning magnetic disk drives and, more particularly, associated with rotation and seek time. While drives can deliver large contiguous amounts of data with an initial latency of 1-5 ms in seek time (moving the drive heads to the correct location on disk), frequent access to non-contiguous data can incur latency on the order of ~40 ms per access. For datasets that involve a lot of randomly accessed data (such as relational databases), the drive seek time becomes a major bottleneck in delivering data in a timely fashion.

Traditional attempts to solve this problem include adding a hierarchy of RAM-based data caches in the data path. This conventional approach is illustrated in FIG. 1. As shown in FIG. 1, when a server computer 110 attempts to access data from storage system 102 via a network 120, there are typically at least three different caches in the overall data path. A hard drive data cache 108 provides about 8 Mbytes of cache, a storage system cache 106 provides between about 128 Mbytes and 16 Gbytes, and a server computer data cache 112 provides between about 100 Mbytes and 2 Gbytes (typical lightly loaded system). While such caches are generally beneficial, certain drawbacks remain. For example, the performance problems mentioned above still occur when the active data set is being accessed randomly or is too large to fit into the caches normally present, or when the I/O requirements of the dataset exceed the capabilities of the controller attached to the cache.

There have been a number of attempts to create caching products which try to attack this problem through custom hardware solutions. Examples of this include RAMSAN from Texas Memory Systems, Houston, Tex. and e- and n-series products from Solid Data, Santa Clara, Calif. These products are inadequate because they rely on solid-state disk technology which tends to be both expensive and limited in maximum storage size.

Flash memory is a non-volatile computer memory that can be erased and reprogrammed. It is offered in various forms ranging from memory cards to SATA based drives. Flash memory has unique characteristics which make using the devices a challenge in enterprise computing environments. Most notably, flash memory supports a limited number of write and/or erase cycles, and exceeding this limit can render the device unusable. Also, the write tolerance of a flash memory can be significantly impacted by the size of the write operations performed. Flash devices were traditionally targeted at storage environments where data was not frequently overwritten. For example, flash memory has been commonly used as a server boot device where the operating system is written once and infrequently updated. Cache appliances, on the other hand, can encounter frequent media writes, both while serving cache misses (on READS) and while processing application WRITES. Also, unlike persistent storage, the contents of a cache device can turn over frequently. Therefore, flash memory has not been viewed as suitable for use in cache applications.

BRIEF SUMMARY OF THE INVENTION

Certain embodiments of the invention provide methods for managing mixed-media cache. Data is received for caching and assigned into one or more blocks. Data may be optionally moved to flash memory from RAM if it is aged and used infrequently. Data may be selected for storing in flash memory based on factors including the size and the age of the data. Certain embodiments of the invention may also provide a multi-tiered cache system comprising a plurality of cache elements including RAM and flash memory elements and a manager configured to control access to the cache elements.

In certain embodiments, a multi-tiered cache manager causes cached data to be initially stored in the RAM elements and selects portions of the cached data stored in the RAM elements to be moved to the flash elements. Each flash element is organized as a plurality of write blocks having a block size, wherein a predefined maximum number of writes is permitted to each write block. The portions of the cached data may be selected based on a maximum write rate calculated from the maximum number of writes and a specified lifetime of the cache system. Each of the portions of the cached data may be moved to a designated write block, and the portions of the cached data are substantially equal in size to the size of the designated write block. Each RAM element can be organized as a plurality of RAM blocks, and portions of the cached data can be moved to flash when no RAM block is available for storing new data.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a conventional approach to cache management.

FIG. 2 is a general depiction of a cache management system 20 according to certain aspects of the invention.

FIG. 3 illustrates operation of a simplified cache manager according to certain aspects of the invention.

DETAILED DESCRIPTION OF THE INVENTION

Embodiments of the present invention will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the invention. Notably, the figures and examples below are not meant to limit the scope of the present invention to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts. Where certain elements of these embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present invention will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the invention. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the invention is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present invention encompasses present and future known equivalents to the components referred to herein by way of illustration.

Certain embodiments of the invention provide systems and methods that can efficiently implement cache architectures, appliances and applications using hybrid media. RAM based clustered cache appliances such as those described in U.S. Patent Application Ser. No. 11/365,474 (“Method and Apparatus for Providing High-Performance and Highly-Scalable Storage Acceleration,” which is incorporated herein by reference in its entirety) can be used with certain extensions to obtain storage acceleration. Such appliances can intercept requests between applications and their storage devices and can cache data to improve performance. Certain aspects of the presently described application extend the application and utility of various caching architectures by enabling the use of a variety of forms of media including flash based devices.

FIG. 2 is a general depiction of a cache management system 20 according to certain aspects of the invention. Cache elements 200, comprising RAM and flash devices, are controlled and managed by a media manager 202. Media manager 202 typically identifies and configures cache elements 200 and manages access to the elements 200. For example, media manager 202 may determine that one of cache elements 200 is a flash device having a minimum block size for writing; accordingly, media manager 202 may set access controls for the element that delay writing until a substantially full block of flash has been accumulated or aggregated. Cache provisioning services 240 allocate cache upon system request and interact with media manager 202 to identify cache types, availability, etc. Cache directory services 260 and I/O 220 and 280 interface with system, servers and clients.

Certain embodiments of the invention employ a plurality of optimizations to permit use of flash in a cache appliance. A two tier (or hybrid) cache architecture may be provided to permit the use of flash media in a cache appliance. In the simplified example depicted in FIG. 3, cache manager 30 manages two forms of media, RAM 34 and flash 36. Each media space 340 and 360 is typically mapped as a collection of blocks. Block sizes for RAM space 340 and flash space 360 need not be the same, and the block size for the flash cache space 360 may be selected to deliver maximum WRITE tolerance. For example, the optimal WRITE block size for current flash drives can range from 128 KB to several megabytes, depending on the manufacturer. The block size for RAM space 340, on the other hand, is typically selected to obtain optimal storage efficiency. In one example where the cache is storing many small files less than 1 Kbyte, the optimal block size for the RAM space 340 could be 1 KB. If larger files are being stored, the optimal block size could be 4 KB or larger. In certain embodiments, multiple flash devices 361-363 are used and the block size of each flash device 361-363 can be set independently to match the ideal I/O size for that device.

Data may be stored in the hybrid cache following a READ cache miss or a WRITE operation and in certain other circumstances. Typically the data is first stored in RAM based cache space 340. Subsequent cache operations and requests are serviced from RAM until the RAM based cache space 340 becomes full. When RAM based cache space 340 is filled, a selection of blocks can be chosen to be de-staged into the flash media 36. Alternatively, the cache may free certain blocks directly from RAM 34 and may de-stage to flash 36. A collection of RAM blocks can be selected to fill up one flash block based on factors including relative block sizes of the RAM space 340 and flash space 360. For example, if the blocks in RAM space 340 are 4 Kbytes in size and the blocks in flash space 360 are sized at 128 Kbytes, then 32 RAM blocks may be accumulated to make up one flash block. To ensure maximum lifetime of the flash media 36, WRITE operations to flash media 36 can be restricted to units of the optimal block size.
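The aggregation of RAM blocks into flash-sized WRITE units described above can be sketched as follows. This is a minimal illustration, not the implementation of the described embodiments: the function names and the use of Python byte strings to stand in for RAM blocks are assumptions made for the example, and the sketch assumes the flash block size is a whole multiple of the RAM block size.

```python
RAM_BLOCK_SIZE = 4 * 1024       # 4 Kbyte RAM blocks, as in the example above
FLASH_BLOCK_SIZE = 128 * 1024   # 128 Kbyte flash write blocks


def ram_blocks_per_flash_block(ram_block_size, flash_block_size):
    """Number of RAM blocks needed to fill exactly one flash write block."""
    if flash_block_size % ram_block_size != 0:
        raise ValueError("flash block size must be a multiple of RAM block size")
    return flash_block_size // ram_block_size


def build_flash_writes(candidates,
                       ram_block_size=RAM_BLOCK_SIZE,
                       flash_block_size=FLASH_BLOCK_SIZE):
    """Group candidate RAM blocks (bytes objects) into full flash-block payloads.

    Returns (payloads, leftover). Leftover blocks stay in RAM until enough
    of them accumulate to fill another flash block, so flash only ever sees
    writes of the full block size.
    """
    per_flash = ram_blocks_per_flash_block(ram_block_size, flash_block_size)
    full = len(candidates) // per_flash * per_flash
    payloads = [b"".join(candidates[i:i + per_flash])
                for i in range(0, full, per_flash)]
    return payloads, candidates[full:]
```

With the sizes used in the text, 70 candidate RAM blocks would yield two 128 Kbyte flash writes and leave six blocks waiting in RAM.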

In addition to controlling the size of WRITE operations to the flash media 36, the hybrid cache can also control the number and rate of WRITE operations to the flash media 36. This can be accomplished in various ways. One simple technique comprises limiting the total number of WRITEs permitted in a given window of time to a selected, often fixed, number. Another technique comprises remembering the total number of WRITEs to the flash media 36 performed during the lifetime of flash media 36; the WRITE rate can be increased or decreased as necessary to maximize efficiency and to meet a specified or guaranteed lifetime of flash media 36. Controlling the WRITE rate of the flash media 36 may be determinative of whether a WRITE can be performed at any given time. A de-staging operation will typically not be performed unless the write rate can be maintained within specification. Consequently, a miss operation may not populate cache and/or certain data may be evicted from cache. Evicted data can be brought into the cache later and/or as needed during a miss operation. Since the number of WRITE cycles of a flash drive 36 also varies with manufacturer, with type of flash used and sometimes with generation of flash and/or drive 36, the WRITE rate can be configured for different types of flash used in the cache and can be individually set for each type of flash drive 361-363 in the drive 36.
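One way to picture the second technique is a running write budget: the permitted rate is the number of writes still available divided by the lifetime still to be covered. The sketch below is illustrative only. The class name, its parameters, and the simplifying assumption that writes are spread evenly over all blocks (so that the device-wide budget is the per-block limit times the number of blocks) are not taken from the specification.

```python
class FlashWriteBudget:
    """Tracks block writes against a lifetime budget (hypothetical helper)."""

    def __init__(self, max_writes_per_block, num_blocks, lifetime_seconds):
        # Assumption: even wear, so the whole device tolerates this many writes.
        self.total_writes_allowed = max_writes_per_block * num_blocks
        self.lifetime_seconds = lifetime_seconds
        self.writes_done = 0

    def allowed_rate(self, elapsed_seconds):
        """Block writes per second sustainable for the remaining lifetime."""
        remaining_time = self.lifetime_seconds - elapsed_seconds
        if remaining_time <= 0:
            return 0.0
        remaining_writes = max(self.total_writes_allowed - self.writes_done, 0)
        return remaining_writes / remaining_time

    def may_write(self, elapsed_seconds):
        """True if one more write keeps the total within the pro-rated budget."""
        capped = min(elapsed_seconds, self.lifetime_seconds)
        prorated = self.total_writes_allowed * capped / self.lifetime_seconds
        return self.writes_done + 1 <= prorated

    def record_write(self):
        self.writes_done += 1
```

A de-staging step would call `may_write` first and, when it returns False, either keep the data in RAM or evict it, which corresponds to the case above where a miss does not populate the cache. Because `allowed_rate` is recomputed from the writes actually performed, a device that has been written lightly is automatically allowed a higher rate later, and vice versa.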

Hybrid cache can maintain internal data structures that map cached blocks of files to either the RAM space 340 or the flash space 360. Files need not be stored entirely in RAM 34 or entirely in flash 36. For example, different blocks 380 and 382 from file 38 can be stored in both RAM space 340 and flash space 360. Internal data structures relating blocks 380 and 382 with their origin in file 38 can be maintained in the RAM space 340 and can optionally be backed up onto the flash media 36. Likewise, blocks 390 and 392 from file 39 may be stored in the same device (here flash 36), and relationships of the blocks 390 and 392 may be maintained by cache manager 30 in RAM space 340 and/or flash space 360.

Plural flash based memory units or flash based disk drives 361-363 can be used to make up the flash cache space 360. Flash WRITE block size can be optimized for writes even to different devices 361-363. The rate of WRITES, however, is typically managed to ensure that the write capability of all available flash devices is utilized. In other words, WRITE operations can be optimally divided across all available units of the media according to individual WRITE rates and/or system level WRITE rates.
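As a rough illustration of dividing WRITE operations across devices according to their individual WRITE rates, the sketch below apportions pending block writes in proportion to each device's per-interval allowance. The function name, the device labels, and the proportional (largest-remainder) rule are assumptions chosen for the example; the specification does not prescribe a particular division scheme.

```python
def split_writes(pending, device_rates):
    """Apportion pending block writes across flash devices.

    device_rates maps a device label to the whole number of block writes
    that device may accept in the current interval. The result maps each
    label to the number of writes assigned; the total never exceeds the
    combined allowance.
    """
    total_rate = sum(device_rates.values())
    if pending <= 0 or total_rate <= 0:
        return {name: 0 for name in device_rates}
    to_assign = min(pending, total_rate)
    # Ideal (possibly fractional) share for each device.
    shares = {name: to_assign * rate / total_rate
              for name, rate in device_rates.items()}
    counts = {name: int(share) for name, share in shares.items()}
    leftover = to_assign - sum(counts.values())
    # Give the rounding remainder to the largest fractional shares.
    by_fraction = sorted(device_rates,
                         key=lambda name: shares[name] - counts[name],
                         reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts
```

For allowances of 6, 3 and 1 writes per interval, ten pending writes are assigned 6, 3 and 1 respectively; a larger backlog is capped at the same ten, and the excess waits for a later interval.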

READ operations that hit in the RAM cache media 34 can be served directly from RAM space 340. READ operations that hit in the flash media 36 and READ operations that hit partly in flash 36 and partly in RAM 34 can be handled in various ways. Each portion of the content can be served from its resident media 34 or 36 and, optionally, the content in flash space 360 can be staged to RAM space 340. READS from the flash media are typically permitted to occur at any block size and do not have to occur in unit sizes set for the optimal WRITE block size.
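The READ handling described above can be sketched as a lookup that serves each block from whichever medium holds it. The function name and the use of dictionaries to stand in for RAM space 340 and flash space 360 are assumptions made for this illustration.

```python
def read_blocks(block_ids, ram, flash, stage_to_ram=False):
    """Serve each requested block from its resident medium.

    ram and flash map block ids to bytes. Returns (data, misses), where
    data holds the blocks found and misses lists the ids found in neither
    medium (these would be fetched from backing storage).
    """
    data, misses = {}, []
    for block_id in block_ids:
        if block_id in ram:
            data[block_id] = ram[block_id]
        elif block_id in flash:
            data[block_id] = flash[block_id]
            if stage_to_ram:
                # Optional staging: later hits are served from RAM.
                ram[block_id] = flash[block_id]
        else:
            misses.append(block_id)
    return data, misses
```

A request that hits partly in RAM and partly in flash simply receives each portion from its own medium; no flash WRITE is involved, so the optimal WRITE block size does not constrain the read.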

In certain embodiments, a hybrid cache system provides methods for selecting data to be transferred from RAM space 340 to flash space 360. The decision on which blocks of data to store in flash may be based on factors that include: age of data, wherein the age of the data is defined by the date the data is stored to cache or the time of last update of the data in cache; frequency of access and/or frequency of use of the data; and frequency of writing to the data.

For example, a good candidate for storage to flash 36 would be data that is aged and frequently accessed; i.e., the data is in demand for read access for long periods of time. However, files that include sections of data that are frequently accessed and frequently written may be split between RAM 34 and flash 36 sections of caches such that portions that are rarely written can be cached in flash 36. Candidates for removal from DRAM 34 include infrequently used aged data. In certain embodiments, such data may be stored in flash 36. Frequently accessed, frequently written data is typically cached in RAM 34 and is a candidate for purging. It will be appreciated that user-defined rules may be determinative of the disposition of data in RAM 34 when new data is to be cached. In that regard, a set of priorities may be associated with various data types, data sources, applications associated with the data and/or physical location of the servers and applications using the data.
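The selection factors above can be illustrated with a simple rule-based classifier. Everything specific in the sketch is an assumption made for the example: the `BlockStats` fields, the threshold values, the result labels, and the choice to keep any frequently written block in RAM. User-defined rules and priorities of the kind mentioned above would replace or adjust these rules in practice.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    age_seconds: float      # time since the block was cached or last updated
    reads_per_hour: float   # observed access frequency
    writes_per_hour: float  # observed update frequency


def classify(stats, min_age=3600.0, hot_reads=10.0, hot_writes=1.0):
    """Suggest a disposition for a RAM-resident block (thresholds hypothetical)."""
    aged = stats.age_seconds >= min_age
    read_hot = stats.reads_per_hour >= hot_reads
    write_hot = stats.writes_per_hour >= hot_writes
    if write_hot:
        # Frequent updates would consume flash write cycles; keep in RAM.
        return "keep_in_ram"
    if aged and read_hot:
        # Aged, frequently read, rarely written: good flash candidate.
        return "move_to_flash"
    if aged:
        # Aged and rarely used: remove from RAM, optionally to flash.
        return "evict_or_flash"
    return "keep_in_ram"
```

Applying the classifier per block rather than per file matches the description above, in which the rarely written portions of a file can sit in flash while its frequently written portions stay in RAM.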

The location of a particular data block, whether in DRAM 34 or flash 36, can be hidden from other components of the system through a block virtualization layer. The block virtualization layer maps virtual blocks of a certain size to physical blocks that can be located either in DRAM 34 or in flash 36. Other system components need not be aware of the actual location of a block. As needs arise, for example if the DRAM runs out of space, blocks can be transparently migrated from DRAM 34 to flash 36, without affecting any other system components which refer to the block by its virtual address. Blocks can also be transparently compressed in either DRAM 34 or flash 36.
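A block virtualization layer of this kind can be sketched as a map from virtual block identifiers to a medium and a physical block identifier. The class and method names below are hypothetical, the two media are modeled as in-memory dictionaries, and compression is omitted; the point of the sketch is only that callers use the virtual identifier before and after a migration.

```python
class BlockVirtualizer:
    """Maps virtual block ids to (tier, physical id) pairs (illustrative only)."""

    def __init__(self):
        self._media = {"dram": {}, "flash": {}}  # tier -> {physical id: data}
        self._map = {}                           # virtual id -> (tier, physical id)
        self._next_physical = 0

    def _place(self, tier, data):
        physical_id = self._next_physical
        self._next_physical += 1
        self._media[tier][physical_id] = data
        return (tier, physical_id)

    def write(self, virtual_id, data, tier="dram"):
        # Release the old physical block, if any, before placing the new one.
        if virtual_id in self._map:
            old_tier, old_physical = self._map[virtual_id]
            del self._media[old_tier][old_physical]
        self._map[virtual_id] = self._place(tier, data)

    def read(self, virtual_id):
        tier, physical_id = self._map[virtual_id]
        return self._media[tier][physical_id]

    def migrate(self, virtual_id, to_tier):
        """Move a block between tiers; its virtual id is unchanged."""
        self.write(virtual_id, self.read(virtual_id), to_tier)

    def tier_of(self, virtual_id):
        return self._map[virtual_id][0]
```

After `migrate("v1", "flash")`, a component that calls `read("v1")` receives the same bytes as before and has no need to know that the block moved.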

Cache objects that are smaller in size than a physical block can be stored on flash media 36 with the assistance of a memory allocator 32. A memory allocator 32 consumes fixed-size (virtual or physical) blocks and divides them up into smaller size units for consumption by the rest of the system. For example, a 1 KB block can be divided up by an allocator 32 into units of 128 bytes, 256 bytes, or smaller units. Any suitable allocator implementation known to the skilled artisan can be adapted for use in accordance with certain aspects of the invention. For example, certain library implementations of “malloc” may be used, as can other publicly available components such as the Slab Allocator, as are known in the art.
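The division of a fixed-size block into smaller units can be sketched with a very small slab-style allocator. This is a toy illustration under stated assumptions, not a substitute for a real malloc or Slab Allocator implementation: the class name, the free-list representation, and the default sizes (a 1 KB block split into 128-byte units, as in the example above) are chosen only for the example.

```python
class SlabBlock:
    """Splits one fixed-size block into equal units (hypothetical helper)."""

    def __init__(self, block_size=1024, unit_size=128):
        if block_size % unit_size != 0:
            raise ValueError("block size must be a multiple of unit size")
        self.unit_size = unit_size
        self.buffer = bytearray(block_size)
        self.free_units = list(range(block_size // unit_size))

    def alloc(self, data):
        """Store data in a free unit; return its index, or None if full."""
        if len(data) > self.unit_size:
            raise ValueError("object larger than unit size")
        if not self.free_units:
            return None
        unit = self.free_units.pop(0)
        offset = unit * self.unit_size
        self.buffer[offset:offset + len(data)] = data
        return unit

    def read(self, unit, length):
        offset = unit * self.unit_size
        return bytes(self.buffer[offset:offset + length])

    def free(self, unit):
        self.free_units.append(unit)
```

Because small objects are packed into the in-memory buffer first, the whole block can later be written to flash in a single full-block WRITE, consistent with the block-size restriction described earlier.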

The hybrid cache as described above is implemented within a single node of the cluster cache architecture. All the cluster based techniques of the previously cited US patent application still apply. In addition, the hybrid cache architecture can be used in a single node form, without the addition of any cluster technologies.

Additional Descriptions of Certain Aspects of the Invention

Certain embodiments of the invention provide methods for managing mixed-media cache. Some of these embodiments comprise receiving data for caching, assigning the received data into one or more blocks, optionally moving aged data stored in a RAM block to flash memory and storing the one or more blocks in RAM, wherein moving the data stored in the RAM block includes selecting the RAM block based on factors including the size of the one or more blocks and the age of the moved data. Certain embodiments of the invention may also provide a multi-tiered cache system. Some of these embodiments comprise a plurality of cache elements including RAM and flash elements and a manager configured to control access to the cache elements.

In some of these embodiments, the manager causes cached data to be initially stored in the RAM elements and selects portions of the cached data stored in the RAM elements to be moved to the flash elements. In some of these embodiments, each flash element is organized as a plurality of write blocks having a block size. In some of these embodiments, a predefined maximum number of writes is permitted for each write block. In some of these embodiments, the portions of the cached data are selected based on a maximum write rate calculated from the maximum number of writes and a specified lifetime of the cache system. In some of these embodiments, each of the portions of the cached data is moved to a designated write block and the portions of the cached data are substantially equal in size to the size of the designated write block. In some of these embodiments, each RAM element is organized as a plurality of RAM blocks. In some of these embodiments, each of the portions of the cached data is moved when no RAM block is available for storing new data. In some of these embodiments, RAM blocks are unavailable when they contain cached data. In some of these embodiments, no RAM block is available when no available RAM block is large enough to store new cached data.

Although the present invention has been described with reference to specific exemplary embodiments, it will be evident to one of ordinary skill in the art that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

What is claimed is:
 1. An apparatus comprising: a manager configured to control access to a memory device, the memory device having: first memory type elements; and second memory type elements capable of performing a limited number of write operations, wherein the manager causes data to be stored in first memory type elements and selects data to be moved from the first memory type elements in accordance with a policy that adjusts a maximum write rate of the selected data being moved from the first memory type elements to the second memory type elements based on a current number of times that a write operation has been performed to memory elements of the second memory type elements and a lifetime of the second memory type elements.
 2. The apparatus of claim 1, further comprising the memory device.
 3. The apparatus of claim 2, wherein the second memory type element is a flash memory element.
 4. The apparatus of claim 1, wherein the second memory type element is configured to have a plurality of blocks having a block size and a predetermined maximum number of writes is permitted for each block.
 5. The apparatus of claim 4, wherein a number of times that a write operation has been performed to a selected group of blocks of the plurality of blocks is used to determine a maximum write rate to be permitted for write operations to the selected group of blocks.
 6. The apparatus of claim 1, wherein the lifetime of the second memory type elements is an epoch time during which the second memory type elements meet a predetermined lifetime specification.
 7. The apparatus of claim 1, wherein the lifetime of the second memory type elements is an epoch time during which the second memory type elements meet a predetermined lifetime guarantee.
 8. The apparatus of claim 1, wherein a quantity of data to be moved from the first memory element type to the second memory element type is selected such that a substantially optimum block size of the second memory element type is written.
 9. The apparatus of claim 1, wherein a quantity of data to be moved from the first memory element type to the second memory element type is selected such that an optimum block size of the second memory element type is written.
 10. The apparatus of claim 1, wherein the selected data to be moved from the first memory type elements is discarded when no block of the second memory type elements is available for writing a block size of the data to be moved from the first memory type elements.
 11. The apparatus of claim 1, wherein data includes associated data and metadata and the metadata and associated data is selectively stored in either the first memory type elements or the second memory type elements.
 12. The apparatus of claim 4 wherein the first memory type elements have a first block size and the second memory type elements have a second block size and data is moved from the first memory type elements to the second memory type elements when there is no available block of memory in the first memory type elements to store new data.
 13. The apparatus of claim 1 wherein the first memory type elements are a volatile memory.
 14. The apparatus of claim 1, wherein the second memory type elements are a non-volatile memory type.
 15. The apparatus of claim 14, wherein the non-volatile memory type is a flash memory type.
 16. A method for managing a memory device, comprising: configuring a manager controlling access to a memory device comprising: first memory type elements; and second memory type elements capable of performing a limited number of write operations; to perform the steps of: receiving data for caching; writing the received data into the first memory type elements; selecting data stored in the first memory type elements for writing to second memory type elements based on factors including a size and an age of the data, wherein a maximum write rate to the second memory type elements is calculated from the predetermined maximum number of writes and a lifetime of the second memory type elements.
 17. The method of claim 1, wherein the selection of data for writing to second memory type elements includes a factor selected from data type, data source, user application type, or location of servers in communication with the memory device.
 17. The method of claim 15, further comprising virtualizing the received data such that the physical address of data stored in the memory device is hidden from a user.
 18. The method of claim 15, wherein each of the first memory type elements and the second memory type elements is configured to have a first block size and a second block size, respectively, and when data having a size less than a block size of the memory element to which the data is to be written has been received, storage locations in a physical block are allocated to accommodate the data such that a plurality of writes of data of less than the physical block size are made to the physical block.